Indexing and Data Access Methods for Database Mining
Abstract
Most of today’s techniques for data mining, and for association rule mining (ARM) in particular, are really “flat file mining”, since the database is typically dumped to an intermediate flat file that is input to the mining software. Previous research on integrating ARM with databases mainly looked at exploiting the language (SQL) as a tool for implementing mining algorithms. In this paper we explore an alternative approach, using various data access methods and systems programming techniques to study the efficiency of mining data. We present a systematic comparison of the performance of horizontal (Apriori) and vertical (Eclat) ARM approaches using flat-file storage and a range of indexed database approaches. Measurements of run time as a function of database and minimum support threshold are analyzed. Experimental profiling measures of the frequency and cost of various operations are discussed. This analysis motivated both the use of adaptive ARM techniques and the development of a simple yet novel linked block structure to support efficient transaction pruning. We also explore techniques for determining what kinds of data benefit from pruning, and when pruning is likely to help.
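As a rough illustration of the horizontal versus vertical distinction mentioned in the abstract, the sketch below counts the support of an itemset in two ways: by scanning every transaction (the layout Apriori-style algorithms work on) and by intersecting per-item transaction-id lists (the layout Eclat-style algorithms work on). The toy data and the function names `support_horizontal`, `build_tidlists`, and `support_vertical` are assumptions made for this example; they are not taken from the paper and do not reflect its access methods or its linked block structure.

```python
from collections import defaultdict

# Toy transactions (hypothetical data, not from the paper).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]


def support_horizontal(itemset, transactions):
    """Horizontal counting: scan every transaction and test containment."""
    return sum(1 for t in transactions if itemset <= t)


def build_tidlists(transactions):
    """Vertical layout: map each item to the ids of transactions containing it."""
    tidlists = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists[item].add(tid)
    return tidlists


def support_vertical(itemset, tidlists):
    """Vertical counting: intersect the tid-lists of the items.

    Assumes a non-empty itemset; an item never seen has an empty tid-list.
    """
    lists = [tidlists.get(item, set()) for item in itemset]
    return len(set.intersection(*lists))


tidlists = build_tidlists(transactions)
print(support_horizontal({"a", "c"}, transactions))
print(support_vertical({"a", "c"}, tidlists))
```

Both calls give the same count for the same itemset; what differs is the access pattern, which is the kind of cost the abstract says the study measures.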
Similar resources
Physical Data Modeling for Multidimensional Access Methods
Despite the fact that the database community has proposed a vast number of indexing methods over the years, no standard physical data model has been established like it has been achieved on the conceptual and logical level. How to optimize a given data model by using various indexing methods is still the “trade secret” of the database administrators. Only recently, some approaches have been tri...
Conformity of the Structural Requirements of Iranian Medical Science Journals with Scopus Indexing Criteria
Background and Aim: In the recent years the number of science research health journals has increased in Iran. These journals should be based on the standards and criteria required in international indexing database. The aim of this study was to determine the adaptation rate of structural requirements on the Iranian medical journals with the criteria of indexing based on Scopus indexing database...
Set-Oriented Indexes for Data Mining Queries
One of the most popular data mining methods is frequent itemset and association rule discovery. Mined patterns are usually stored in a relational database for future use. Analyzing discovered patterns requires excessive subset search querying in large amount of database tuples. Indexes available in relational database systems are not well suited for this class of queries. In this paper we study...
Efficient Item Set Mining Supported by IMine Index
This paper presents the IMine index, a general and compact structure which provides tight integration of item set extraction in a relational DBMS. Since no constraint is enforced during the index creation phase, IMine provides a complete representation of the original database. To reduce the I/O cost, data accessed together during the same extraction phase are clustered on the same disk block. ...
Logs ?
Web access logs, usually stored in relational databases, are commonly used for various data mining and data analysis tasks. The tasks typically consist in searching the web access logs for event sequences that support a given sequential pattern. For large data volumes, this type of searching is extremely time consuming and is not well optimized by traditional indexing techniques. In this paper ...
Publication date: 2002